2024-07-22 08:41:14 · AIbase · 10.5k
Open-Source Vision-Language Representation Learning Model RWKV-CLIP
DeepGlint has open-sourced RWKV-CLIP, a vision-language representation learner that combines the strengths of the Transformer and RNN architectures. The model is pre-trained on image-text pairs gathered from the web, with the dataset further expanded using information retrieved from various websites, and it delivers significantly improved performance on vision and language tasks. To address noisy web data and improve data quality, the research team also introduced a diverse description generation framework.
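To make the pre-training objective concrete, below is a minimal PyTorch sketch of CLIP-style image-text contrastive learning, the kind of objective a dual-tower model like RWKV-CLIP optimizes. The encoder classes and their dimensions are hypothetical placeholders for illustration only, not the actual RWKV-based towers from the released code.

```python
# Minimal sketch of CLIP-style contrastive pre-training on image-text pairs.
# The encoders here are dummy stand-ins (assumptions), not RWKV-CLIP's real towers.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DummyImageEncoder(nn.Module):
    """Placeholder for the image tower (hypothetical)."""
    def __init__(self, embed_dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, embed_dim))

    def forward(self, images):
        return self.net(images)

class DummyTextEncoder(nn.Module):
    """Placeholder for the text tower (hypothetical)."""
    def __init__(self, vocab_size=49408, embed_dim=512):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, embed_dim)

    def forward(self, token_ids):
        return self.embed(token_ids)

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE: matched image-text pairs in the batch are positives,
    every other pairing serves as a negative."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)            # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)        # text -> image direction
    return (loss_i2t + loss_t2i) / 2

# Toy usage on random data.
images = torch.randn(8, 3, 224, 224)
tokens = torch.randint(0, 49408, (8, 32))
loss = clip_contrastive_loss(DummyImageEncoder()(images), DummyTextEncoder()(tokens))
print(loss.item())
```

In a real setup the two towers would be the model's image and text encoders and the batch would come from the web-crawled image-text dataset; the symmetric loss is what pulls matching image and text embeddings together in the shared space.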